Skip to content

Add internal profiling to debug tool #32423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 15, 2025

Conversation

SangJunBak
Copy link
Contributor

@SangJunBak SangJunBak commented May 6, 2025

See commit messages for details

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@SangJunBak SangJunBak force-pushed the jun/debug-tool/add-internal-profiling branch 3 times, most recently from 754036d to a2867a8 Compare May 7, 2025 20:20
@SangJunBak SangJunBak requested a review from ParkMyCar May 13, 2025 07:09
@SangJunBak SangJunBak changed the title WIP Add internal profiling to debug tool May 13, 2025
@SangJunBak SangJunBak marked this pull request as ready for review May 13, 2025 14:34
@SangJunBak SangJunBak requested a review from a team as a code owner May 13, 2025 14:34
Copy link
Contributor

@kay-kim kay-kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs part lgtm

Copy link
Member

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet!

@@ -1,7 +1,7 @@
[package]
name = "mz-debug"
description = "Debug tool for self-managed Materialize."
version = "0.1.0"
version = "0.2.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bumping the version like this is a good habit, but out of curiosity is it necessary? do we publish it anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, just something manual I do whenever I decide to publish haha. We do publish it as a curlable binary that we expose via the docs!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And technically one can download old versions, though we use the version via a git tag, so it never actually reads this cargo.toml. We have instructions to update it here /Users/sangjunbak/materialize/ci/deploy_mz-debug/README.md but kinda upto the developer to make sure both are in sync 😬

static INTERNAL_HOST_ADDRESS: &str = "127.0.0.1";
static INTERNAL_HTTP_PORT: i32 = 6878;

/// A struct that handles downloading and saving profile data from HTTP endpoints
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we always try to end comments with periods, in other words, make them complete sentences.

output_path: &Path,
) -> Result<(), anyhow::Error> {
// Try HTTPS first, then fall back to HTTP if that fails
let mut url = format!("https://{}", relative_url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of manually formatting the URL like this I would opt to use the url crate and then the set_scheme(...) method on the Url type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think in order to create a URL, we need to use Url::parse which expects an absolute URL. Converted it into a URL anyways via Url::parse(&format!("https://{}", relative_url)) then set the scheme to http later but lemme know if there's a better way!


/// Downloads and saves heap profile data
pub async fn dump_heap_profile(&self, relative_url: &str, service_name: &str) -> Result<()> {
let output_dir = self.context.base_path.join("profiles");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these directory names should be constants that are grouped at the top of the file? That way it's easier to determine what folders the debug tool might create? Not blocking but just a thought

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah like saving profiles or self.context.base_path.join("profiles")? I think saving paths like profiles in static str variables like RELATIVE_DIR makes total sense.

The latter might be an issue since base_path has format mz-debug_<DateTime> where DateTime gets computed on main at the start of the program. We could have it done as a constant then have internal_http_dumper and everything alike refer to it, but I like the flexibility of having the DateTime be computed on runtime in case we:
a) Do multiple runs with multiple date times (i.e. mz-debug/<DateTime>)
b) We do some initialization that takes a while and wanna compute DateTime later

- Instead of waiting until port forwarding, we read from stderr until we know for sure port forwarding has started
- Renamed `create_kubectl_port_forwarder` to `create_pg_wire_port_forwarder` since we'll be port forwarding more than once
- Removed the retry mechanism given we don't see the port forwarding disconnecting too often and we can't do it AND initial acknowledgement together easily
- Tie child process to a struct instead of a tokio spawn.
We create an initialization function that preprocesses as much as possible and maps args to context values.
I kept the trait but realized the enum is a bit redundant, especially since we have the Context enum
Given we're always creating a PathBuf from start time, it makes sense to just pass the PathBuf itself
- Use stdout instead of stderr
- Remove sharing of context
- Fix bug where process was being killed on connecting to port forward
- Introduced InternalHttpDumpClient for downloading and saving heap profile data and Prometheus metrics from internal HTTP endpoints.
- Changes service for port forwarding from balancerd to environmentd for better consistency with metrics
- Refactor "find" methods for port forwarding to be more modular. Separates out port finding / port forwarder creation from service finding
- Introduced a new asynchronous function `get_container_ip` to fetch the IP address of a Docker container using its ID. Doing this means we don't need the user to map ports when doing `DOCKER RUN`
- Changes mz_connection_url for the emulator to an optional argument and global now that we have the container's IP
- Update CLI flags in docs and clippy/lint errors
@SangJunBak SangJunBak force-pushed the jun/debug-tool/add-internal-profiling branch from 8a7573a to a79040c Compare May 15, 2025 03:06
- Introduced constants for directory paths in `docker_dumper.rs`, `internal_http_dumper.rs`, and `system_catalog_dumper.rs` for readability
- Updated the output file naming convention for heap profiles in `internal_http_dumper.rs` to use a `.gz` extension since they're really gzip files
- Added `url` crate for URL parsing.
- Updated warn messages in to print context for improved debugging
@SangJunBak SangJunBak merged commit 4487749 into main May 15, 2025
86 checks passed
@SangJunBak SangJunBak deleted the jun/debug-tool/add-internal-profiling branch May 15, 2025 11:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants